There's no 'Count or Predict' but task-based selection for distributional models
Authors
Abstract
In this paper, we investigate the differences between prediction-based (word2vec), dense count-based (GloVe) and sparse count-based (JoBimText) semantic models. We evaluate the models, which were selected because they can all be computed efficiently on large data, on word similarity tasks and a semantic ranking task, for both verbs and nouns. We demonstrate that prediction-based models yield higher scores than the other two models at determining a similarity score between two words. In contrast, sparse count-based methods perform best in the ranking task. Further, sparse count-based methods benefit more from linguistically informed contexts, such as dependency relations. In summary, we highlight differences between popular distributional semantic representations and derive recommendations for their usage.
Similar resources
Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors
Context-predicting models (more commonly known as embeddings or neural language models) are the new kids on the distributional semantics block. Despite the buzz surrounding these models, the literature is still lacking a systematic comparison of the predictive models with classic, count-vector-based distributional semantic approaches. In this paper, we perform such an extensive evaluation, on a...
A corpus-based evaluation method for Distributional Semantic Models
Evaluation methods for Distributional Semantic Models typically rely on behaviorally derived gold standards. These methods are difficult to deploy in languages with scarce linguistic/behavioral resources. We introduce a corpus-based measure that evaluates the stability of the lexical semantic similarity space using a pseudo-synonym same-different detection task and no external resources. We sho...
Investigating the effect of the big five personality factors on adolescents' membership on Facebook
Introduction: Nowadays, Facebook as a social networking site is one of the most popular hobbies of cyberspace among adolescents and young people. Tendency or reluctance to Facebook is determined by personality traits of the user. Method: To investigate the effect of big five personality factors on the membership of adolescents on Facebook, 350 students (175 male students and 175 female student...
Modeling Semantic Plausibility by Injecting World Knowledge
Distributional data tells us that a man can swallow candy, but not that a man can swallow a paintball, since this is never attested. However both are physically plausible events. This paper introduces the task of semantic plausibility: recognizing plausible but possibly novel events. We present a new crowdsourced dataset of semantic plausibility judgments of single events such as “man swallow p...
Publication date: 2017